Recommendations with IBM

In this notebook, I will put the recommendation skills I have acquired so far to use on real data from the IBM Watson Studio platform.

The table of contents outlines the different recommendation methods, each suited to a different situation.

Table of Contents

I. Exploratory Data Analysis
II. Rank Based Recommendations
III. User-User Based Collaborative Filtering
IV. Matrix Factorization
V. Extras & Concluding

Let's start by importing the necessary libraries and reading in the data.

The goal is to recommend articles to real users, by which I mean users with email addresses. In our dataset, these correspond to rows where the column labelled email is not missing. Since we would like to recommend articles to real users, we want to remove all rows in the dataframe df with missing data in the email column.

There are $17$ rows with missing data in the email column of the dataframe df.

The complete set of $17$ rows with missing data in the email column is shown in the next cell below.

Among the $13$ articles with ids [268.0, 20.0, 62.0, 162.0, 224.0, 415.0, 647.0, 846.0, 961.0, 965.0, 1016.0, 1174.0, 1393.0], the article with id 268.0 is the one most interacted with among the users with missing emails.

I will therefore remove all rows with missing emails from the dataframe df.
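As a minimal sketch of this cleaning step (the column names follow the dataset, but the values below are invented for illustration):

```python
import pandas as pd

# Toy stand-in for df (same columns as the dataset; values invented for illustration)
df = pd.DataFrame({
    'article_id': [268.0, 20.0, 268.0, 62.0],
    'title': ['a', 'b', 'a', 'c'],
    'email': ['u1@x.com', None, None, 'u2@x.com'],
})

n_missing = df['email'].isnull().sum()   # rows belonging to users without an email
df = df.dropna(subset=['email'])         # keep only rows with an email address
print(n_missing, len(df))
```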

Part I : Exploratory Data Analysis

In the cells below, I will provide some insight into the descriptive statistics of the data.

1. What is the distribution of the number of articles a user interacts with in the dataset? Provide a visual and descriptive statistics summarizing the number of times each user interacts with an article.

2. Explore and remove duplicate articles from the df_content dataframe.

3. Find the following:

a. The number of unique articles that have an interaction with a user.

b. The number of unique articles in the dataset (whether they have any interactions or not).

c. The number of unique users in the dataset (excluding null values).

d. The number of user-article interactions in the dataset.

4. Let us find the most viewed article_id, as well as how often it was viewed.

After talking to the company leaders, the email_mapper function was deemed a reasonable way to map users to ids. There were a small number of null values, and it was found that all of these null values likely belonged to a single user (which is how they are stored using the function below).
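The real notebook defines its own email_mapper; the toy version below only sketches the idea of giving every email an integer id and coding all null emails as one shared user:

```python
import pandas as pd

# Toy frame with an email column (values invented for illustration)
df = pd.DataFrame({'email': ['a@x.com', None, 'b@x.com', 'a@x.com', None]})

def email_mapper(email_series):
    """Map each email to an integer id; all nulls share one id (one assumed user)."""
    coded = {}
    ids = []
    next_id = 1
    for email in email_series:
        key = email if pd.notnull(email) else '__missing__'
        if key not in coded:
            coded[key] = next_id
            next_id += 1
        ids.append(coded[key])
    return ids

df['user_id'] = email_mapper(df['email'])
print(df['user_id'].tolist())  # → [1, 2, 3, 1, 2]
```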

Part II: Rank-Based Recommendations

Unlike in the earlier lessons, we don't actually have ratings for whether a user liked an article or not. We only know that a user has interacted with an article. In these cases, the popularity of an article can really only be based on how often an article was interacted with.

1. The function below returns the top n articles ordered with most interactions on the top.
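A minimal sketch of such a rank-based function on invented data (the helper name get_top_article_ids is an assumption):

```python
import pandas as pd

# Toy interactions: article 1.0 is seen 3 times, 2.0 twice, 3.0 once
df = pd.DataFrame({
    'article_id': [1.0, 2.0, 1.0, 3.0, 1.0, 2.0],
    'title': ['t1', 't2', 't1', 't3', 't1', 't2'],
})

def get_top_article_ids(n, df):
    """Return the n article ids with the most interactions, most frequent first."""
    return df['article_id'].value_counts().index[:n].tolist()

print(get_top_article_ids(2, df))  # → [1.0, 2.0]
```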

Part III: User-User Based Collaborative Filtering

1. The function below reformats the df dataframe to be shaped with users as the rows and articles as the columns, where each entry is 1 if the user interacted with the article and 0 otherwise.
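On a toy frame, the reformatting might be sketched as follows (the helper name create_user_item_matrix is an assumption):

```python
import pandas as pd

# Toy interactions frame (values invented for illustration)
df = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'article_id': [10.0, 20.0, 10.0, 30.0],
})

def create_user_item_matrix(df):
    """Rows = users, columns = articles, 1 if the user interacted with the article, else 0."""
    user_item = df.groupby(['user_id', 'article_id']).size().unstack(fill_value=0)
    return (user_item > 0).astype(int)

user_item = create_user_item_matrix(df)
print(user_item.shape)  # → (3, 3): 3 users by 3 articles
```

Note that repeated user-article interactions collapse to a single 1, which matches the binary view of the data used in the rest of the notebook.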

2. The function below takes a user_id and provides an ordered list of the most similar users to that user (from most similar to least similar). The returned result does not contain the provided user_id, since we already know that each user is similar to him/herself. Note that, because the entries for each user here are binary, it (perhaps) makes sense to compute similarity as the dot product of two users.
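A sketch of the dot-product similarity idea on a toy user_item matrix (the helper name find_similar_users is an assumption):

```python
import pandas as pd

# Toy binary user-item matrix: rows are users, columns are article ids
user_item = pd.DataFrame(
    [[1, 1, 0], [1, 0, 0], [0, 1, 1]],
    index=[1, 2, 3], columns=[10.0, 20.0, 30.0])

def find_similar_users(user_id, user_item):
    """Dot-product similarity of one user's row against all rows; drop the user itself."""
    sims = user_item.dot(user_item.loc[user_id])
    sims = sims.drop(user_id).sort_values(ascending=False)
    return sims.index.tolist()

print(find_similar_users(1, user_item))
```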

3. Now that I have a function that provides the most similar users to each user, I want to use these users to find articles I can recommend. The functions below return the articles I would recommend to each user.
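A sketch of the recommendation step on a toy matrix (the helper name user_user_recs and the cut-off m are assumptions): walk through the most similar users and collect articles the target user has not seen yet.

```python
import pandas as pd

# Toy binary user-item matrix: rows are users, columns are article ids
user_item = pd.DataFrame(
    [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 1]],
    index=[1, 2, 3], columns=[10.0, 20.0, 30.0, 40.0])

def user_user_recs(user_id, user_item, m=10):
    """Collect up to m articles seen by the most similar users but not by user_id."""
    seen = set(user_item.columns[user_item.loc[user_id] == 1])
    sims = user_item.dot(user_item.loc[user_id]).drop(user_id).sort_values(ascending=False)
    recs = []
    for other in sims.index:
        for art in user_item.columns[user_item.loc[other] == 1]:
            if art not in seen and art not in recs:
                recs.append(art)
            if len(recs) >= m:
                return recs
    return recs

print(user_user_recs(1, user_item, m=2))
```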

4. Let us improve the consistency of the user_user_recs function from above.
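One way to make the function consistent is to break ties among equally similar users by their total number of interactions; a sketch, with the assumed helper name get_top_sorted_users:

```python
import pandas as pd

# Toy binary user-item matrix; users 2 and 3 are equally similar to user 1,
# but user 3 has more total interactions
user_item = pd.DataFrame(
    [[1, 1, 0], [1, 0, 0], [1, 0, 1]],
    index=[1, 2, 3], columns=[10.0, 20.0, 30.0])

def get_top_sorted_users(user_id, user_item):
    """Rank other users by similarity, breaking ties by total interactions."""
    sims = user_item.dot(user_item.loc[user_id]).drop(user_id)
    num_interactions = user_item.sum(axis=1).drop(user_id)
    ranking = pd.DataFrame({'similarity': sims, 'num_interactions': num_interactions})
    return ranking.sort_values(['similarity', 'num_interactions'], ascending=False)

print(get_top_sorted_users(1, user_item).index.tolist())  # → [3, 2]
```

Sorting on a secondary key makes the ordering deterministic, so repeated calls always return the same recommendations.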

5. Let's test our function by finding some similar users to users 1 and 131.

6. Suppose there is a new user on the platform. Which articles would we recommend that they read?

If there is a new user on the platform, I will recommend the top 10 most-read articles.

7. Let's use our existing functions to provide the top 10 recommended articles for a new user.

Part IV: Matrix Factorization

In this part of the notebook, I will use matrix factorization to make article recommendations to the users on the IBM Watson Studio platform using the data given in the dataframe user_item.

The outputs of the two cells above indicate that the user_item_matrix is extremely sparse, with zeros almost everywhere. In particular, $99.08\%$ of the entries in the user_item_matrix are zeros, while the remaining $0.92\%$ are ones. Since every entry is either a zero or a one, there are no missing values in the user_item_matrix dataframe.
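The sparsity figure can be computed directly from the binary matrix; a sketch on a toy array:

```python
import numpy as np

# Toy binary matrix standing in for user_item_matrix: 2 ones out of 9 entries
user_item = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 0]])

sparsity = (user_item == 0).mean()  # fraction of entries that are zero
print(round(sparsity, 4))
```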

2. Let's use the Singular Value Decomposition from numpy on the user_item_matrix.

Note that, because user_item_matrix does not have any missing values, the np.linalg.svd function runs without any errors. If it had missing values, we would have to resort to the FunkSVD algorithm.
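A sketch of the decomposition on a small, complete binary matrix (shapes shown are for the full SVD returned by np.linalg.svd):

```python
import numpy as np

# Toy complete binary matrix: 4 users by 3 articles
user_item = np.array([[1., 1., 0.], [1., 0., 0.], [0., 1., 1.], [0., 0., 1.]])

u, s, vt = np.linalg.svd(user_item)
print(u.shape, s.shape, vt.shape)  # → (4, 4) (3,) (3, 3)

# Keeping all singular values recovers the original matrix
k = len(s)
approx = u[:, :k] @ np.diag(s) @ vt[:k, :]
print(np.allclose(approx, user_item))  # → True
```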

3. Now for the tricky part: how do we choose the number of latent features to use? The code in the next cell below indicates that increasing the number of latent features decreases the error rate when predicting the 1 and 0 values in the user_item_matrix dataframe. That gives us an idea of how the accuracy improves as we increase the number of latent features.
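A sketch of that check on a toy matrix: reconstruct the matrix with the top k singular values, round to 0/1, and count the disagreements as k grows.

```python
import numpy as np

# Toy complete binary matrix: 4 users by 3 articles
user_item = np.array([[1., 1., 0.], [1., 0., 0.], [0., 1., 1.], [0., 0., 1.]])
u, s, vt = np.linalg.svd(user_item)

errs = []
for k in range(1, len(s) + 1):
    # Rank-k reconstruction, rounded back to 0/1 predictions
    approx = np.around(u[:, :k] @ np.diag(s[:k]) @ vt[:k, :])
    errs.append(np.sum(np.abs(user_item - approx)))
print(errs)  # error shrinks as k grows; exact at full rank
```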

4. From the above, we can't really be sure how many features to use, because simply having a better way to predict the 1's and 0's of the matrix doesn't exactly indicate whether we are able to make good recommendations. Instead, we might split our dataset into a training and test set, as shown in the cell below.

In the subsequent cells below, I will use the code from question 3 to understand the impact on accuracy of the training and test sets of data with different numbers of latent features. Using the split below:

5. Now I use the user_item_train dataset from above to find U, S, and V transpose using SVD. Then I find the subset of rows in the user_item_test dataset that I can predict using this matrix decomposition with different numbers of latent features, to see how many features make sense to keep based on the accuracy on the test data. Note that this will require combining what I did in questions 2-4 of this section.

Let us explore how well SVD works towards making predictions for recommendations on the test data.

Let's see how well we can use the training decomposition to predict on the test data.

6. Given my results, I discuss what I would do to determine whether the recommendations made with any of the above recommendation systems improve on how users currently find articles.

The highly sparse user_item_matrix reflects a highly unbalanced interaction pattern between the users and the articles, which explains why $99.08\%$ of the entries in the user_item_matrix are zeros while the remaining $0.92\%$ are ones. Consequently, over-fitting is very likely to occur, because the model can achieve reasonably good accuracy simply by predicting the absence of interaction between the users and the articles. That could be why the accuracy score is much higher than the ${F_{1}}$-score on the test data. To reduce the over-fitting, one can reduce the number of latent features, rebalance the classes (for instance, by downsampling the majority class), use cross-validation, or get more data.

The plot of the ${F_{1}}$-scores of both the training and testing data against the number of latent features indicates that decreasing the number of latent features improves the ${F_{1}}$-scores for the test data. However, the improvement is not substantial: with the number of latent features set to 90, the highest ${F_{1}}$-score recorded was $13.11\%$. Thus, one may have to resort to the other methods mentioned earlier.
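The accuracy-versus-${F_{1}}$ gap on imbalanced data can be illustrated with a toy example in which a model predicts "no interaction" everywhere:

```python
import numpy as np

# Toy labels: 98 non-interactions and 2 interactions; predict all zeros
y_true = np.array([0] * 98 + [1] * 2)
y_pred = np.zeros(100, dtype=int)

accuracy = (y_true == y_pred).mean()

# F1 for the positive (interaction) class, computed by hand
tp = np.sum((y_true == 1) & (y_pred == 1))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))
f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
print(accuracy, f1)  # → 0.98 0.0: high accuracy, zero F1
```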

After splitting the data into training and testing sets, one can observe that only $20$ users appear in both. Thus, the test results might not be an accurate representation of how well the recommendations are performing. As a remedy, one may want to seek more data.
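The overlap check can be sketched on toy data (the positional split used here is an assumption standing in for the notebook's actual split):

```python
import pandas as pd

# Toy interactions; the first four rows act as training data, the rest as test data
df = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 2, 4],
    'article_id': [10.0, 20.0, 10.0, 30.0, 20.0, 10.0],
})
df_train, df_test = df.iloc[:4], df.iloc[4:]

train_users = set(df_train['user_id'])
test_users = set(df_test['user_id'])
print(sorted(train_users & test_users))  # users we can actually evaluate on
```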

The given metric, which records user-article interactions, does not provide much information. For example, it is hard to determine whether users fail to interact with articles because they did not like them, because they did not get the opportunity to read them, or for some other reason. A better metric I would suggest is for users to rate the articles they read. However, there is a good chance that users like the articles they choose to view. Therefore, the low ${F_{1}}$-score for the test data does not necessarily imply that the recommendations were below average.

Finally, since the platform is online, one can also test the performance of the recommendations through an A/B test. Namely, half of the users would be assigned to the experimental group, while the other half would stay on the old recommendation system as the control group. In the experimental design, one could request that users rate the articles they read. In addition, the experiment could internally record the average time users spend reading an article. These two metrics could perhaps help to make better recommendations than the old recommendation system.